api/Mail: tokenise search input to align with EGroupware syntax #240
Merged
ralfbecker merged 1 commit on May 14, 2026
Adds two protected static helpers, buildTokenizedSearch() and parseSearchTokens(), and rewrites the SUBJECT/FROM/TO/CC/BCC/BODY/TEXT branches of createIMAPFilter() to tokenise multi-word input and combine per-token Horde sub-queries with andSearch()/orSearch() and negation.

Brings the Mail search behaviour in line with the documented EGroupware syntax already used in Addressbook, Calendar and InfoLog:

  foo bar     -> contains foo OR contains bar (default)
  foo +bar    -> contains foo AND contains bar (required)
  foo -bar    -> contains foo AND NOT contains bar (forbidden)
  foo or bar  -> contains foo OR contains bar
  foo and bar -> contains foo AND contains bar
  "foo bar"   -> literal phrase as single token (preserves legacy behaviour)

Single-word input produces byte-identical IMAP queries to the previous implementation. Multi-word input previously tried to match a literal contiguous substring (which almost never succeeded); now it applies the documented A B = A or B rule. Users who rely on contiguous match can wrap the value in double quotes.

The patch leverages the existing Horde_Imap_Client_Search_Query primitives (andSearch / orSearch / negation flag) that the codebase already uses in the QUICK/QUICKWITHCC branches, so there is no new dependency.

Forum discussion: https://help.egroupware.org/t/79137
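The tokenisation rules listed above can be illustrated with a small sketch. This is Python rather than the PHP in the patch, and parse_search_tokens() is a hypothetical stand-in for parseSearchTokens(): the operator labels ("or", "and", "not"), the regular expression and the handling of the bare words and/or are assumptions made for illustration, not the merged implementation.

```python
import re

# Hypothetical stand-in for parseSearchTokens(); the real PHP helper may differ.
# Optional +/- sign, then either a double-quoted phrase or a run of non-spaces.
_TOKEN_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_search_tokens(text):
    """Split search input into (operator, term) pairs.

    operator is "or" (default), "and" (+term, or a term after the word
    'and') or "not" (-term). A quoted phrase stays a single term.
    """
    tokens = []
    pending = None  # operator requested by a bare 'and' / 'or' keyword
    for match in _TOKEN_RE.finditer(text):
        sign, quoted, bare = match.group(1), match.group(2), match.group(3)
        # Unquoted, unsigned 'and' / 'or' set the operator for the next term.
        if quoted is None and not sign and bare.lower() in ("and", "or"):
            pending = bare.lower()
            continue
        term = quoted if quoted is not None else bare
        if not term:
            continue  # skip an empty quoted phrase
        if sign == "+":
            op = "and"
        elif sign == "-":
            op = "not"
        else:
            op = pending or "or"
        tokens.append((op, term))
        pending = None
    return tokens


print(parse_search_tokens('foo +bar -baz "two words"'))
# [('or', 'foo'), ('and', 'bar'), ('not', 'baz'), ('or', 'two words')]
```

In this sketch the operator attached to the first token has no effect on its own; it only matters once a later step decides how tokens are joined.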
Member
Thx for this pull-request :) Ralf
CActor added a commit to CActor/egroupware that referenced this pull request on May 15, 2026
The original tokenisation patch (EGroupware#240) covered the SUBJECT/FROM/TO/CC/BCC/BODY/TEXT case branches of createIMAPFilter(), but left the multi-header "Quick" search (case BYDATE / QUICK / QUICKWITHCC at L2251) untouched. Those branches still called headerText() directly with the raw user string, so multi-word queries with '+token', '-token', '"phrase"' or AND/OR operators were sent to IMAP as a single literal substring, defeating the user-facing syntax everywhere except the dedicated per-field modes.

This commit applies the same buildTokenizedSearch() helper from EGroupware#240 to that case branch. For each token from parseSearchTokens():

- positive token : (SUBJECT OR FROM/TO [OR CC]) contains term
- negative token : (SUBJECT AND FROM/TO [AND CC]) does NOT contain term
- tokens are combined by buildTokenizedSearch() with the operator precedence already validated for the other case branches

Legacy single-token queries produce IMAP queries semantically equivalent to the previous code path, so there is no regression for users who just type one word in the Quick search box. Multi-word inputs now behave like the documented EGroupware search syntax used everywhere else in the app.

Tested:

- Single-token Quick search ('fattura') -> matches subject/from/to as before
- Multi-token AND ('+fattura +dicembre') -> only mails with both terms
- Multi-token NOT ('+fattura -spam') -> excludes mails containing spam
- Multi-token OR ('fattura ricevuta') -> mails with either term
- Quoted phrase ('"fattura di dicembre"') -> contiguous-substring match
- QUICKWITHCC variant adds CC to the headers visited per token

Forum discussion: https://help.egroupware.org/t/79137/19
Companion to: EGroupware#240, EGroupware#241
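The per-token header expansion described in this commit message follows De Morgan's rule: a positive token must appear in at least one header, while a negative token must be absent from every header. The sketch below illustrates that in Python, emitting plain IMAP SEARCH criteria strings instead of Horde query objects. The function names are hypothetical, the header list is supplied by the caller (the commit does not spell out when FROM or TO is chosen), and quoting of special characters in the term is ignored.

```python
def any_header(headers, term):
    """Positive token: the term appears in at least one of the headers.

    IMAP's OR takes exactly two search keys, so the chain is nested
    from the right: OR a OR b c.
    """
    keys = [f'{header} "{term}"' for header in headers]
    query = keys[-1]
    for key in reversed(keys[:-1]):
        query = f"OR {key} {query}"
    return query


def no_header(headers, term):
    """Negative token: the term appears in none of the headers.

    Search keys placed side by side are ANDed by IMAP, so this is
    NOT a AND NOT b AND NOT c.
    """
    return " ".join(f'NOT {header} "{term}"' for header in headers)


quick = ["SUBJECT", "FROM"]
print(any_header(quick, "fattura"))
# OR SUBJECT "fattura" FROM "fattura"
print(no_header(quick + ["CC"], "spam"))
# NOT SUBJECT "spam" NOT FROM "spam" NOT CC "spam"
```

Adding CC to the header list, as the QUICKWITHCC variant does, only lengthens the OR chain or the run of NOT keys; the shape of the query stays the same.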
Summary
Aligns the Mail module search-box behaviour with the documented EGroupware search syntax used in Addressbook / Calendar / InfoLog. Multi-word queries now match messages whose terms can appear in any order and with arbitrary text between them.
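The difference between the two matching styles can be shown with a toy example. This is a Python sketch that imitates substring matching on an in-memory string; it does not talk to an IMAP server, and the function names and sample message are illustrative assumptions.

```python
def contiguous_match(query, text):
    """Old behaviour: the whole input must appear as one substring."""
    return query.lower() in text.lower()


def per_token_match(query, text, require_all=False):
    """New behaviour: each whitespace-separated token is checked alone.

    require_all=False mirrors the default OR rule, require_all=True
    mirrors a query in which every token is marked as required.
    """
    hits = [token.lower() in text.lower() for token in query.split()]
    return all(hits) if require_all else any(hits)


message = "the invoice from 02/2026 is overdue"

print(contiguous_match("invoice overdue", message))                   # False
print(per_token_match("invoice overdue", message))                    # True
print(per_token_match("overdue invoice", message, require_all=True))  # True
```

Word order and the text between the terms no longer matter once each token is matched on its own.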
Problem this solves
Documented in detail in the forum thread (linked below). The Mail search field is the only one in EGroupware that forwards the entire input string to the IMAP server as a single substring match. A query like invoice overdue becomes SEARCH TEXT "invoice overdue", so a message body saying "the invoice from 02/2026 is overdue" is not matched, even though both words are clearly present.

Even with fts_flatcurve enabled on Dovecot, the phrase is matched as a contiguous bigram of tokens, so non-adjacent words remain a hard miss.

Solution
Two new protected static helpers added to the Mail class:

- buildTokenizedSearch($string, callable $factory): ?Horde_Imap_Client_Search_Query
- parseSearchTokens($string): array

The relevant case branches (SUBJECT, FROM, TO, CC, BCC, BODY, TEXT) in createIMAPFilter() now tokenise the input and call the factory for each token, then combine the sub-queries with Horde_Imap_Client_Search_Query::andSearch() / orSearch() / negation according to the EGroupware-standard operator on each token.

User-facing syntax (matches Addressbook/Calendar/InfoLog):

  invoice             -> contains invoice
  invoice overdue     -> contains invoice OR contains overdue
  invoice and overdue -> contains invoice AND contains overdue
  invoice +overdue    -> contains invoice AND contains overdue
  invoice -spam       -> contains invoice AND NOT contains spam
  "invoice overdue"   -> literal phrase as single token

The resulting IMAP query for invoice +overdue with TEXT search type is (BODY "invoice") (BODY "overdue"): two index lookups intersected by Dovecot (sub-millisecond with Flatcurve, two linear scans without).

A single-token input produces byte-identical output to the historical generator, so there is no regression for users who type a single word.
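The combination step can be sketched as a left fold over already-parsed tokens. This is a Python illustration that emits IMAP SEARCH criteria strings rather than Horde_Imap_Client_Search_Query objects. The function name mirrors buildTokenizedSearch() but is hypothetical, the (operator, term) input format is an assumption, and the left-to-right precedence is a guess that happens to agree with the syntax examples; the precedence in the merged PHP code may differ.

```python
def build_tokenized_search(tokens, factory):
    """Fold (operator, term) pairs into one IMAP SEARCH criteria string.

    tokens  : list of ("or" | "and" | "not", term) pairs
    factory : callable turning one term into a search key, e.g. BODY "term"
    Returns None for an empty token list.
    Assumption: terms contain no double quotes (no escaping is done).
    """
    query = None
    single = True  # True while `query` is exactly one IMAP search key
    for op, term in tokens:
        sub = f"({factory(term)})"
        if op == "not":
            sub = f"NOT {sub}"
        if query is None:
            query = sub
        elif op == "or":
            # OR takes exactly two keys, so a juxtaposed (ANDed) left side
            # has to be grouped first.
            left = query if single else f"({query})"
            query, single = f"OR {left} {sub}", True
        else:
            # "and" and "not": IMAP ANDs keys that are placed side by side.
            query, single = f"{query} {sub}", False
    return query


def body(term):
    return f'BODY "{term}"'


print(build_tokenized_search([("or", "invoice"), ("and", "overdue")], body))
# (BODY "invoice") (BODY "overdue")
print(build_tokenized_search([("or", "invoice"), ("or", "overdue")], body))
# OR (BODY "invoice") (BODY "overdue")
print(build_tokenized_search([("or", "invoice"), ("not", "spam")], body))
# (BODY "invoice") NOT (BODY "spam")
```

Passing the per-field key as a factory keeps the fold independent of the search type: swapping body for a SUBJECT or FROM factory changes the keys but not the combination logic.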
Files changed
api/src/Mail.php: ~130 lines of effective delta

Test plan
- fts_flatcurve
- -: verified AND NOT semantics
- "foo bar": verified historical contiguous match preserved
- buildTokenizedSearch() and parseSearchTokens(): happy to add as part of review

Backward compatibility
Fully preserved for single-word inputs. Multi-word inputs change semantics: previously they tried to match a literal contiguous substring (which almost never succeeded), now they apply the documented A B = A or B rule. Users who specifically relied on contiguous match (rare) can now wrap the value in double quotes to preserve the old semantics.

Related